Implement gMSA for Windows upstream kubernetes tests #2208
Conversation
Skipping CI for Draft Pull Request.
Force-pushed from 4eb2c61 to 2f3c4cb.
Setup in the Azure sub is complete. The serial slow test passed and the gMSA tests ran and passed 🥳
Force-pushed from c73ea82 to 5d9765f.
/test pull-cluster-api-provider-azure-windows-containerd-upstream-with-ci-artifacts-serial-slow
gMSA tests passed again. /assign @CecileRobertMichon @marosset
It would be nice if we could figure out a way to not need new templates to run the gMSA tests.
In the PR description I called this out. The first commit in the PR demonstrates how this would look without an additional template. I moved to the template approach for a couple of reasons:
I could go either way but lean toward having the additional template for the above reasons.
Not to throw a wrench in current progress, but can the gMSA install pre-reqs be implemented as a helm chart?
The only component that needs to be installed for the implementation here is the ccg keyvault plugin, which is installed on the VM via image-builder. The webhook is installed at test run time so it's not needed here. There is a WIP for a chart if a customer wanted to use it; otherwise it would need to be installed separately. The ccg plugin is not a part of the chart.
@jsturtevant sounds like wrapping all of the needful capz stuff into a helm chart for installation via test CI is not worth the effort... |
keyVaultClient.Authorizer = keyvaultAuthorizer

// Wait for the Cluster nodes to be ready (this is different than capi's ready as cni needs to finish initializing)
windowsCalico := &appsv1.DaemonSet{
This test is assuming the Calico CNI; would we ever want it to run with other CNIs? Would it be equivalent to check that the nodes are all "Ready"?
I did this before I learned about: ControlPlaneWaiters: clusterctl.ControlPlaneWaiters{
Maybe that would be a better approach? It might help with some flakes that happen before the Windows pods are fully ready too.
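For comparison, the simpler "all nodes Ready" check raised above could be scripted directly; a minimal sketch, assuming kubectl is pointed at the workload cluster and that the Windows Calico daemonset is named calico-node-windows (both are assumptions, not taken from this PR):

# Sketch: wait for every node to report Ready (assumes KUBECONFIG targets the workload cluster)
kubectl wait --for=condition=Ready node --all --timeout=20m
# Sketch: wait for the Windows Calico daemonset to finish rolling out (name is an assumption)
kubectl rollout status daemonset/calico-node-windows -n kube-system --timeout=20m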
GMSA_DOMAIN_ENVSUBST="${REPO_ROOT}/scripts/gmsa/domain.init"
GMSA_DOMAIN_FILE="${REPO_ROOT}/scripts/gmsa/domain.init.tmpl"
$ENVSUBST < "$GMSA_DOMAIN_FILE" > "$GMSA_DOMAIN_ENVSUBST"
az vm create -l "$AZURE_LOCATION" -g "$GMSA_NODE_RG" -n "$vmname" \
Did you consider using the Go SDK to run these prerequisites directly in the test suite instead of using the az CLI in a script? Similar to what we do for the private cluster custom vnet setup.
I did. Having the creation of the domain outside the test suite means it can be used across test entry point scripts or by someone outside the project for their own testing. This still requires a few additional steps before being able to run the tests, so maybe bringing it into the test suite would be fine now.
The other aspect of this is that testing the domain creation was cumbersome, and having it outside the test suite made it easier to iterate on without having to modify the rest of the tests to be skipped while working on it.
I listed a bunch of ideas in #1860 (comment).
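To illustrate the standalone-use point, a hedged sketch of invoking the out-of-band setup; the env var names mirror the snippet above, the values are placeholders, and only scripts/gmsa/gmsa-setup.sh is an actual path from this PR:

# Sketch: run the one-time gMSA setup outside the test suite (values are placeholders)
export AZURE_LOCATION="westus2"
export GMSA_NODE_RG="gmsa-domain-rg"
"${REPO_ROOT}/scripts/gmsa/gmsa-setup.sh"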
[APPROVALNOTIFIER] This PR is NOT APPROVED
This pull-request has been approved by:
The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing /approve in a comment.
spec:
  identity: UserAssigned
  userAssignedIdentities:
  - providerID: "/subscriptions/${AZURE_SUBSCRIPTION_ID}/resourceGroups/${CI_RG}/providers/Microsoft.ManagedIdentity/userAssignedIdentities/cloud-provider-user-identity"
doesn't this require the fix in #2214 to work?
I had a similar question, but after discussing with @mboersma we figured out that since we are not testing any cloud-provider-specific flows in this set of tests, it doesn't exercise the code that would cause issues.
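As a quick way to verify the referenced identity actually exists before running the tests, a hedged sketch using the az CLI (the identity and resource group names come from the template above):

# Sketch: confirm the user-assigned identity referenced by the template exists
az identity show -g "${CI_RG}" -n cloud-provider-user-identity --query principalId -o tsv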
"${REPO_ROOT}/hack/log/redact.sh" || true | ||
} | ||
|
||
trap cleanup EXIT | ||
|
||
if [[ "${WINDOWS}" == "true" ]]; then | ||
if [[ $KUBETEST_WINDOWS_CONFIG =~ "windows-serial-slow" ]]; then |
Is this bit documented anywhere?
I can imagine someone having a bad day trying to troubleshoot why gMSA tests aren't running correctly somewhere in Prow, only to find this conditional...
Maybe we could at least add a log line like 'Skipping gMSA configuration' if we aren't performing the config, to help debugging?
It isn't. We currently run the gMSA tests in the serial/slow jobs. There isn't a strict requirement for this, and in fact the tests are pretty fast; just the initial setup of the cluster and domain is slow.
I used this because I didn't want to introduce yet another env var, but maybe it would be better to have it as additional setup? It would make your suggestion and debugging simpler.
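A hedged sketch of that suggestion, an explicit opt-in env var plus a skip log line (the GMSA variable and the configure script path are illustrative, not part of this PR):

# Sketch: gate gMSA setup on an explicit env var and log when skipping
if [[ "${GMSA:-false}" == "true" ]]; then
  "${REPO_ROOT}/scripts/gmsa/configure-gmsa.sh"  # hypothetical script name
else
  echo "Skipping gMSA configuration (set GMSA=true to enable)"
fi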
Had a discussion in Slack and Zoom to go over the following in a bit more detail:
Summary was:
Also discussed how to make this as reusable as possible for other providers, but so much of the setup is Azure-specific that there may not be much. One possible idea was to move some of the testing templates and scripts to live in the sig-windows gmsa repo, along with maybe using the outcome of the az capi cli updates to create the clusters.
@jsturtevant: PR needs rebase. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
Most of this work has been moved to kubernetes-sigs/windows-testing#328. /close
@jsturtevant: Closed this PR. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
What type of PR is this?
/kind feature
What this PR does / why we need it:
This adds the needed infrastructure and setup required to test gMSA for Windows. See details in #1860.
Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged): Fixes #1860
Special notes for your reviewer:
This requires one-time setup in the subscription using the provided script scripts/gmsa/gmsa-setup.sh, which configures some required managed identities and access to key vault resources. This requires special privileges (Microsoft.Authorization/roleAssignments/write) to configure access.
Other requirements:
cncf-upstream:capi-windows:k8s-1dot23dot5-windows-2022-containerd:2022.03.29 and cncf-upstream:capi-windows:k8s-1dot23dot5-windows-2019-containerd:2022.03.30
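For context on why Microsoft.Authorization/roleAssignments/write is needed, a hedged sketch of the kind of role assignment such setup performs (the identity and vault names here are placeholders, not the script's actual values):

# Sketch: grant a managed identity read access to key vault secrets via RBAC (placeholder names)
az role assignment create \
  --assignee-object-id "$(az identity show -g "${CI_RG}" -n gmsa-user-identity --query principalId -o tsv)" \
  --role "Key Vault Secrets User" \
  --scope "$(az keyvault show -n gmsa-test-vault --query id -o tsv)"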
Initially I tried to implement this without introducing a new template, but felt it would be closer to a customer experience if we had a new template. I left the commits for now but can rebase once we get through reviews.
Please confirm that if this PR changes any image versions, then that's the sole change this PR makes.
TODOs:
Release note: